The Role of Hubness in High-dimensional Data Analysis
Abstract
Machine learning on intrinsically high-dimensional data is known to be challenging, a difficulty usually referred to as the curse of dimensionality. Designing machine learning methods that perform well in many dimensions is critical, since high-dimensional data arises often in practical applications; typical examples include textual, image and multimedia feature representations, as well as time series and biomedical data. The hubness phenomenon [1] has recently come into focus as an important aspect of the curse of dimensionality that affects many instance-based machine learning systems. With increasing dimensionality, the distribution of instance relevance within the models tends to become long-tailed. A small number of hub points dominate the analysis and influence a disproportionate number of system predictions. Most remaining points are rarely or never retrieved in relevance queries, resulting in information loss. High data hubness has been linked to poor system performance in many data domains. The dissertation [2] proposes several novel hubness-aware machine learning algorithms to improve the effectiveness of machine learning in intrinsically high-dimensional data. The proposed
Similar resources
A Study on Clustering High Dimensional Data Using Hubness Phenomenon
Data mining is the non-trivial process of extracting information from the very large database. In recent years, data repository has a high dimensional data, which makes a complete search in most of the data mining problems leads computationally infeasible. To eradicate this problem clustering plays a vital role in handling low dimensional data and high dimensional data. Low dimensional data mak...
Clustering with Shared Nearest Neighbor-unscented Transform Based Estimation
Subspace clustering developed from the group of cluster objects in all subspaces of a dataset. When clustering high dimensional objects, the accuracy and efficiency of traditional clustering algorithms are very poor, because data objects may belong to diverse clusters in different subspaces comprised of different combinations of dimensions. To overcome the above issue, we are going to implement...
Hub Co-occurrence Modeling for Robust High-Dimensional kNN Classification
The emergence of hubs in k-nearest neighbor (kNN) topologies of intrinsically high dimensional data has recently been shown to be quite detrimental to many standard machine learning tasks, including classification. Robust hubness-aware learning methods are required in order to overcome the impact of the highly uneven distribution of influence. In this paper, we have adapted the Hidden Naive Bay...
An Improved Unsupervised Cluster based Hubness Technique for Outlier Detection in High dimensional data
Outlier detection in high dimensional data becomes an emerging technique in today’s research in the area of data mining. It tries to find entities that are considerably unrelated, unique and inconsistent with respect to the common data in an input database. It faces various challenges because of the increase of dimensionality. Hubness has recently been developed as an important concept and acts...
Class imbalance and the curse of minority hubs
Most machine learning tasks involve learning from high-dimensional data, which is often quite difficult to handle. Hubness is an aspect of the curse of dimensionality that was shown to be highly detrimental to k-nearest neighbor methods in high-dimensional feature spaces. Hubs, very frequent nearest neighbors, emerge as centers of influence within the data and often act as semantic singularitie...
Journal title:
- Informatica (Slovenia)
Volume 38, Issue
Pages -
Publication date: 2014